Zellner’s g-prior is a popular prior choice for model selection problems in the context of normal regression models. Wang and Sun [J. Statist. Plann. Inference 147 (2014) 95-105] recently adopted this prior and placed a special hyper-prior on g, which results in a closed-form expression of the Bayes factor for nested linear model comparisons. They showed that, under very general conditions, the Bayes factor is consistent when the two competing models are of order $O(n^{r})$ for $r < 1$, and that for $r = 1$ it is almost consistent, except for a small inconsistency region around the null hypothesis. In this paper, we study Bayes factor consistency for nonnested linear models with a growing number of parameters. Some of the proposed results generalize those for the Bayes factor in the case of nested linear models. Specifically, we compare the asymptotic behaviors of the proposed Bayes factor and the intrinsic Bayes factor in the literature.
1. Introduction

We reconsider the classical linear regression model 

$$Y = \mathbf{1}_n\alpha + X_p\beta_p + \varepsilon, \qquad (1.1)$$

where $Y = (y_1,\ldots,y_n)'$ is an n-vector of responses, $X_p$ is an $n\times p$ design matrix of full column rank containing all potential predictors, $\mathbf{1}_n$ is an $n\times 1$ vector of ones, $\alpha$ is an unknown intercept, and $\beta_p$ is a p-vector of unknown regression coefficients. Throughout the paper, it is assumed that the random error for all models follows the multivariate normal distribution, denoted by $\varepsilon \sim N(\mathbf{0}_n, \sigma^2 I_n)$, where $\mathbf{0}_n$ is an $n\times 1$ vector of zeros, $\sigma^2$ is an unknown positive scalar, and $I_n$ is the n-dimensional identity matrix. Without loss of generality, we also assume that the columns of $X_p$ have been centered, so that each column has mean zero.

In the class of linear regression models, we often assume that there is an unknown subset of important predictors that contributes to the prediction of Y, or has an impact on the response variable Y. This is naturally a model selection problem, in which we would like to select a linear model by identifying the important predictors in this subset. Suppose that we have two such linear regression models $M_j$ and $M_i$, with dimensions j and i,


$$M_j:\; Y = \mathbf{1}_n\alpha + X_j\beta_j + \varepsilon, \qquad (1.2)$$

$$M_i:\; Y = \mathbf{1}_n\alpha + X_i\beta_i + \varepsilon, \qquad (1.3)$$


where, for $k \in \{i, j\}$, $X_k$ is an $n\times k$ submatrix of $X_p$ and $\beta_k$ is a $k\times 1$ vector of unknown regression coefficients. As commented by Kass and Raftery [11], a natural way to compare the two competing models is the Bayes factor, which has attractive model selection consistency properties. Here, consistency means that the true model will eventually be selected if enough data are provided, assuming that the true model exists. Our particular interest in this paper is the model selection consistency of the Bayes factor when the model dimension grows with the sample size. To be more specific, we consider the following three asymptotic scenarios:

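Scenario 1. $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 < 1$.

Scenario 2. $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 = 1$.

Scenario 3. $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $\alpha_1 = \alpha_2 = 1$.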

When the two models $M_i$ and $M_j$ are nested, Moreno, Giron and Casella [18] study the consistency of the intrinsic Bayes factor under the three asymptotic scenarios. Later, Wang and Sun [22] derive an explicit closed-form Bayes factor associated with Zellner’s g-prior for comparing the two models. They show that, under very general conditions, the Bayes factor is consistent when the two models are of order $O(n^{r})$ for $r < 1$, and that for $r = 1$ it is almost consistent, except for a small inconsistency region around the null hypothesis. Such a small set of models around the null hypothesis can be characterized in terms of a pseudo-distance between models defined by Moreno and Giron [17]. Finally, Wang and Sun [22] compare their results with those for the intrinsic Bayes factor due to [18].


It should be noted that $M_i$ and $M_j$ are not necessarily nested in many practical situations. As commented by Pesaran and Weeks [20], “in econometric analysis, nonnested models arise naturally when rival economic theories are used to explain the same phenomenon, such as unemployment, inflation or output growth.” In fact, the problem of comparing nonnested models has been studied in a fairly large body of econometric and statistical literature from both practical and theoretical viewpoints, dating back to [10]. For instance, Cox [4] develops a likelihood ratio testing procedure and shows that, under appropriate conditions, the proposed approach and its variants have well-behaved asymptotic properties. Watnik and Johnson [25] consider the asymptotic behavior of three different testing procedures (the J-test, the JA-test, and the modified Cox test) for the analysis of nonnested linear models under the alternative hypothesis. The interested reader is referred to [24] and [25] for detailed descriptions of the three testing procedures.

Giron et al. [7] consider the intrinsic Bayes factor for comparing pairs of nonnested models based on two different encompassing criteria: encompassing from above and encompassing from below. Later, Moreno and Giron [17] present a comparative analysis of the intrinsic Bayes factor under the two criteria in linear regression models. Recently, Giron et al. [8] study the consistency of the intrinsic Bayes factor for nonnested linear models under the first two asymptotic scenarios above. These two papers focus on the consistency of the intrinsic Bayes factor when the model dimension grows with the sample size; under the same asymptotic scenarios, the consistency of the Bayes factor based on Zellner’s g-prior, a popular prior choice for model selection in linear regression models, is of equal interest. To the best of our knowledge, the latter has received little attention, even though addressing the consistency issue for nonnested models is of the utmost importance.

In this paper, we investigate the consistency of the Bayes factor associated with Zellner’s g-prior for comparing nonnested models under the three asymptotic scenarios above. Specifically, we compare the asymptotic results of the proposed Bayes factor with those of the intrinsic Bayes factor due to [8]. The results show that the asymptotic behaviors of the two Bayes factors are quite comparable in the first two scenarios. Notably, we also study the consistency of the proposed Bayes factor under Scenario 3, which, as highlighted by Giron et al. [8], is still an open problem for the intrinsic Bayes factor.

The remainder of this paper is organized as follows. In Section 2, we present an explicit closed-form expression of the Bayes factor based on the null-based approach. In Section 3, we address the consistency of the Bayes factor for nonnested models under the three asymptotic scenarios and compare the results with those for the intrinsic Bayes factor. An application of the results in Section 3 to ANOVA models is provided in Section 4. Some concluding remarks are presented in Section 5, with additional proofs given in the Appendix.


2. Bayes factor 


Within a Bayesian framework, a common approach to model selection is to compare models in terms of their posterior probabilities, given by

$$P(M_j\mid Y) = \frac{p(M_j)\,p(Y\mid M_j)}{\sum_k p(M_k)\,p(Y\mid M_k)} = \frac{p(M_j)\,BF[M_j:M_b]}{\sum_k p(M_k)\,BF[M_k:M_b]}, \qquad (2.1)$$

where $p(M_j)$ is the prior probability of model $M_j$, $p(Y\mid M_j)$ is the marginal likelihood of Y given $M_j$, and $BF[M_j:M_b]$ is the Bayes factor, which compares each model $M_j$ to






M. Wang and Y. Maruyama 


the base model $M_b$ and is defined as

$$BF[M_j:M_b] = \frac{p(Y\mid M_j)}{p(Y\mid M_b)}. \qquad (2.2)$$


The Bayes factor in (2.2) depends on the base model $M_b$, which is often chosen arbitrarily in practical situations. There are two common choices for $M_b$: the null-based approach, which uses the null model ($M_0$), and the full-based approach, which uses the full model ($M_p$). This paper focuses on the null-based approach because (i) the null model is commonly used as the base model with Zellner’s g-priors in most of the literature [14], and (ii) unlike that of the full model, the dimension of the null model is independent of the sample size. The latter is crucial in addressing the consistency of the Bayes factor with an increasing model dimension. Accordingly, we compare model $M_j$ with $M_0$:


$$M_j:\; Y = \mathbf{1}_n\alpha + X_j\beta_j + \varepsilon, \qquad (2.3)$$

$$M_0:\; Y = \mathbf{1}_n\alpha + \varepsilon. \qquad (2.4)$$


A common strategy with Zellner’s g-prior [27] is to assign the same noninformative priors to the common parameters that appear in both models and to assign Zellner’s g-prior to the parameters that appear only in the larger model. The rationale for this choice is that if the common parameters are orthogonal to the new parameters in the larger model (i.e., the expected Fisher information matrix is block diagonal), the Bayes factor is quite robust to the choice of the same (even improper) priors for the common parameters; see [12]. Since $\alpha$ and $\sigma^2$ are the common orthogonal parameters in (2.3) and (2.4), we consider the following prior distributions:

$$M_0:\; p(\alpha,\sigma^2) \propto \frac{1}{\sigma^2},$$
$$M_j:\; p(\alpha,\sigma^2) \propto \frac{1}{\sigma^2} \quad\text{and}\quad \beta_j\mid\sigma^2 \sim N\big(\mathbf{0}_j,\; g\sigma^2(X_j'X_j)^{-1}\big). \qquad (2.5)$$

The amount of information in Zellner’s g-prior is controlled by the scaling factor g, and thus the choice of g is quite critical. A review of various choices of g-priors was provided by Liang et al. [14] and discussed further by Ley and Steel [13]. For most of these choices, the Bayes factor does not have an analytically tractable form, so numerical approximations are generally required, and it may not be easy for practitioners to choose an appropriate one. In particular, standard approximations, such as the Laplace approximation, become quite challenging when the number of parameters grows with the sample size.

Maruyama and George [16] propose an explicit closed-form expression of the Bayes factor based on the combined use of a generalization of Zellner’s g-prior and the beta-prime prior for g:

$$\pi(g) = \frac{g^{b}(1+g)^{-a-b-2}}{B(a+1,\,b+1)}, \qquad (2.6)$$

where $a > -1$, $b > -1$, and $B(\cdot,\cdot)$ is the beta function. Noting that Zellner’s g-prior is a special case of the generalization of Zellner’s g-prior in [16], we obtain the following result; its proof follows directly from Theorem 3.1 of [16] and is thus omitted.


Theorem 1. Under the prior in (2.6) with $b = (n-j-1)/2 - a - 2$, the Bayes factor for comparing $M_j$ and $M_0$ can be simplified as

$$BF[M_j:M_0] = \frac{\Gamma(j/2+a+1)\,\Gamma((n-j-1)/2)}{\Gamma(a+1)\,\Gamma((n-1)/2)}\,(1-R_j^2)^{-(n-j-1)/2+a+1}, \qquad (2.7)$$

where $R_j^2$ is the usual coefficient of determination of model $M_j$.
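As a sanity check on (2.7), the following Python sketch (our own illustration, not code from [16]) evaluates the closed form on the log scale and compares it with a brute-force integral of the fixed-g Bayes factor against the prior (2.6). It assumes the standard fixed-g expression $(1+g)^{(n-j-1)/2}[1+g(1-R_j^2)]^{-(n-1)/2}$ for Zellner’s g-prior with a common intercept, as in Liang et al. [14]; the function names and the default $a = -1/2$ are illustrative choices.

```python
import math


def log_bf_closed(r2: float, n: int, j: int, a: float = -0.5) -> float:
    """log BF[M_j : M_0] from the closed form (2.7), with b = (n - j - 1)/2 - a - 2."""
    return (math.lgamma(j / 2 + a + 1) + math.lgamma((n - j - 1) / 2)
            - math.lgamma(a + 1) - math.lgamma((n - 1) / 2)
            + (-(n - j - 1) / 2 + a + 1) * math.log1p(-r2))


def log_bf_numeric(r2: float, n: int, j: int, a: float = -0.5, grid: int = 50_000) -> float:
    """Integrate the fixed-g Bayes factor against the beta-prime prior (2.6)."""
    b = (n - j - 1) / 2 - a - 2
    log_beta = math.lgamma(a + 1) + math.lgamma(b + 1) - math.lgamma(a + b + 2)
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) / grid                 # midpoint rule on t in (0, 1)
        g = t / (1 - t)                      # substitution g = t / (1 - t)
        log_prior = b * math.log(g) - (a + b + 2) * math.log1p(g) - log_beta
        log_bf_g = ((n - j - 1) / 2) * math.log1p(g) - ((n - 1) / 2) * math.log1p(g * (1 - r2))
        total += math.exp(log_prior + log_bf_g) / (1 - t) ** 2   # Jacobian dg/dt
    return math.log(total / grid)


print(log_bf_closed(0.4, 30, 3), log_bf_numeric(0.4, 30, 3))
```

Working on the log scale with `lgamma` avoids overflow of the gamma functions for large n; the numerical integral is only a cross-check and is not needed in practice.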


The Bayes factor in (2.7) is attractive to practitioners because it has an explicit expression without an integral representation, which is not available for other choices of the hyperparameter b. One may argue that this expression comes at a certain cost in interpreting the role of the prior for g, since the prior depends on both the sample size and the model size through the hyperparameter b. Priors of this type have, however, been studied in the literature. For example, Bayarri et al. [1] propose a truncated version of the beta-prime prior for g such that $g > (n+1)/(j+3) - 1$. A similar type of prior has also been considered by Ley and Steel [13].

At this point, we provide several arguments justifying this specification of the hyperparameters. (i) The choice $b = (n-j-1)/2 - a - 2$ yields an implicit $O(n)$ choice of g [16], that is, $g = O(n)$, which prevents the hyper-g prior from asymptotically dominating the likelihood function; (ii) as the sample size grows, the right tail of the beta-prime prior behaves like $g^{-(a+2)}$, leading to a very fat tail for small values of a, an attractive property suggested by Gustafson, Hossain and MacNab [9]; (iii) with $a = -1/2$ and the transformation $\theta = (X_j'X_j)^{1/2}\beta_j$, the prior makes the asymptotic tail behavior of

$$p(\theta\mid\sigma^2) = \int_0^\infty p(\theta\mid\sigma^2, g)\,\pi(g)\,dg \qquad (2.8)$$

the multivariate Cauchy for sufficiently large $\theta \in \mathbb{R}^j$, as recommended by Zellner [27]; (iv) the resulting Bayes factor in (2.7) enjoys nice theoretical properties and good performance in practical applications (see, for example, [16, 22, 23], among others); and (v) when the model dimension j is bounded, the Bayes factor in (2.7) is asymptotically equivalent to the Schwarz approximation.
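For points (i) and (ii), a short calculation from (2.6) may help. This is our own worked derivation; reading the “implicit $O(n)$ choice” as a statement about the prior mode is one possible interpretation, assumed here, and an interior mode requires $b > 0$.

$$\frac{d}{dg}\log\pi(g) = \frac{b}{g} - \frac{a+b+2}{1+g} = 0 \;\Longrightarrow\; g_{\text{mode}} = \frac{b}{a+2} = \frac{(n-j-1)/2 - a - 2}{a+2} = O(n),$$

$$\pi(g) = \frac{1}{B(a+1,b+1)}\left(\frac{g}{1+g}\right)^{b}(1+g)^{-(a+2)} \sim \frac{g^{-(a+2)}}{B(a+1,b+1)}, \qquad g\to\infty.$$

With $a = -1/2$, for fixed n and j the right tail therefore decays only like $g^{-3/2}$.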


Theorem 2. When the model dimension j is fixed, for large sample sizes n, the Bayes 
factor in (2.7) is equivalent to the Schwarz approximation given by 


$$BF[M_j:M_0] \approx \exp\left\{-\frac{j}{2}\log n - \frac{n}{2}\log(1-R_j^2)\right\}. \qquad (2.9)$$

Proof. See the Appendix. □
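The equivalence in Theorem 2 can be checked numerically. The sketch below (our illustration; the function names are hypothetical) compares the logarithm of (2.7) with the logarithm of (2.9) for fixed j and $R_j^2$ as n grows. A Stirling expansion of the gamma ratio, which is our own calculation rather than the proof in the Appendix, suggests that the difference converges to a constant, so the approximation error stays $O(1)$ on the log scale while both quantities grow linearly in n.

```python
import math


def log_bf_closed(r2, n, j, a=-0.5):
    """log of the closed-form Bayes factor (2.7)."""
    return (math.lgamma(j / 2 + a + 1) + math.lgamma((n - j - 1) / 2)
            - math.lgamma(a + 1) - math.lgamma((n - 1) / 2)
            + (-(n - j - 1) / 2 + a + 1) * math.log1p(-r2))


def log_bf_schwarz(r2, n, j):
    """log of the Schwarz approximation (2.9)."""
    return -(j / 2) * math.log(n) - (n / 2) * math.log1p(-r2)


def limiting_gap(r2, j, a=-0.5):
    """Stirling-based limit of log_bf_closed - log_bf_schwarz as n -> infinity."""
    return (math.lgamma(j / 2 + a + 1) - math.lgamma(a + 1)
            + (j / 2) * math.log(2) + ((j + 1) / 2 + a + 1) * math.log1p(-r2))


for n in (100, 10_000, 1_000_000):
    gap = log_bf_closed(0.3, n, 4) - log_bf_schwarz(0.3, n, 4)
    print(n, round(gap, 4), round(limiting_gap(0.3, 4), 4))
```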
One of the most attractive properties of Bayesian approaches is model selection consistency, which means that the true model (assuming it exists) will be selected if enough data are provided. This property has been intensively studied under different asymptotic scenarios as the sample size approaches infinity; for the case of fixed model dimension, see [3, 13, 14, 16], to name just a few. Of particular note is that the various Bayes factors in the listed references behave very similarly in terms of consistency, because for sufficiently large values of n, the intrinsic Bayes factor and Bayes factors associated with g-priors (e.g., $g = n$) and their mixtures (e.g., the Zellner-Siow prior) can all be approximated by the Schwarz approximation in (2.9); see Theorem 2 of [19]. We can also show that this approximation is valid for the Bayes factor with the hyper-g prior in [14].

When the model dimension grows with the sample size, Moreno, Giron and Casella [18] study the consistency of the intrinsic Bayes factor for comparing nested models, and a generalization to nonnested models has been addressed by Giron et al. [8]. More recently, Wang and Sun [22] address the consistency of the Bayes factor associated with Zellner’s g-prior for nested models; its consistency for nonnested models is equally important and is the subject of this paper. We are particularly interested in comparing the asymptotic behaviors of the proposed Bayes factor and the intrinsic Bayes factor under the same asymptotic scenario. The results presented here provide a theoretical basis for comparing nested and nonnested models, both of which arise naturally in practical situations.


3. Bayes factor consistency for nonnested linear models

In this section, we consider the model selection consistency of the Bayes factor for comparing nonnested models under the three asymptotic scenarios. The Bayes factor in (2.7) cannot be applied directly to compare nonnested models, but we can calculate the Bayes factor between $M_j$ and $M_0$, $BF[M_j:M_0]$, and the Bayes factor between $M_i$ and $M_0$, $BF[M_i:M_0]$. The Bayes factor for comparing $M_j$ and $M_i$ can then be formulated as

$$BF[M_j:M_i] = \frac{BF[M_j:M_0]}{BF[M_i:M_0]}. \qquad (3.1)$$


The Bayes factor for comparing $M_j$ and $M_i$ in (1.2) and (1.3) is thus given by

$$BF[M_j:M_i] = \frac{\Gamma(j/2+a+1)\,\Gamma((n-j-1)/2)}{\Gamma(i/2+a+1)\,\Gamma((n-i-1)/2)}\cdot\frac{(1-R_j^2)^{-(n-j-1)/2+a+1}}{(1-R_i^2)^{-(n-i-1)/2+a+1}}. \qquad (3.2)$$
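To make (3.1)-(3.2) concrete, here is a minimal NumPy sketch (our own, with hypothetical helper names) that computes $R^2$ for each model with an intercept and forms $\log BF[M_j:M_i]$ as the difference of the two null-based log Bayes factors. The simulated data, in which the true model uses only the predictors of $M_j$, are purely illustrative.

```python
import numpy as np
from math import lgamma, log1p


def r_squared(y, X):
    """Coefficient of determination of a model with an intercept."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)                      # centring absorbs the intercept
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return 1.0 - float(resid @ resid) / float(yc @ yc)


def log_bf_null(r2, n, k, a=-0.5):
    """log BF[M_k : M_0] from (2.7) for a model with k predictors."""
    return (lgamma(k / 2 + a + 1) + lgamma((n - k - 1) / 2)
            - lgamma(a + 1) - lgamma((n - 1) / 2)
            + (-(n - k - 1) / 2 + a + 1) * log1p(-r2))


def log_bf_nonnested(y, Xj, Xi, a=-0.5):
    """log BF[M_j : M_i] via (3.1): both models are compared through M_0."""
    n = len(y)
    return (log_bf_null(r_squared(y, Xj), n, Xj.shape[1], a)
            - log_bf_null(r_squared(y, Xi), n, Xi.shape[1], a))


rng = np.random.default_rng(1)
n = 500
X = rng.standard_normal((n, 5))
y = 1.0 + X[:, :3] @ np.array([1.0, -0.5, 0.8]) + rng.standard_normal(n)
Xj, Xi = X[:, :3], X[:, 3:]                      # disjoint predictor sets: nonnested
print(log_bf_nonnested(y, Xj, Xi))
```

Because both models are compared through the same base model, the construction is antisymmetric: swapping the two design matrices simply flips the sign of the log Bayes factor.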


Let $M_T$ stand for the true model

$$M_T:\; Y = \mathbf{1}_n\alpha + X_T\beta_T + \varepsilon.$$
According to [5], the Bayes factor is said to be consistent when

$$\operatorname*{plim}_{n\to\infty} BF[M_j:M_i] = \infty$$

if $M_j$ is the true model $M_T$, whereas

$$\operatorname*{plim}_{n\to\infty} BF[M_j:M_i] = 0$$

if $M_i$ is the true model $M_T$, where “plim” stands for convergence in probability and the probability distribution is the sampling distribution under $M_T$. For notational simplicity, let

$$\delta_{ji} = \frac{1}{n\sigma^2}\,\beta_j' X_j'(I_n - H_i)X_j\beta_j,$$


where $H_i = X_i(X_i'X_i)^{-1}X_i'$ with $X_i$ being an $n\times i$ submatrix of $X_p$. According to [8], $\delta_{ji}$ can be viewed as a pseudo-distance between $M_j$ and $M_i$, where the two models are not necessarily nested. This pseudo-distance has the following properties: (i) it is always equal to 0 from any model $M_j$ to itself, that is, $\delta_{jj} = 0$; (ii) if $M_i$ is nested in $M_j$, it is also equal to 0, that is, $\delta_{ij} = 0$; and (iii) for any model $M_k$, we have $\delta_{ki} \ge \delta_{kj}$ if $M_i$ is nested in $M_j$. To study model selection consistency, it is usually assumed that, as the sample size approaches infinity, the limiting value of $\delta_{ji}$, denoted by $\delta^*_{ji}$, always exists, where


$$\delta^*_{ji} = \lim_{n\to\infty}\frac{1}{n\sigma^2}\,\beta_j' X_j'(I_n - H_i)X_j\beta_j. \qquad (3.3)$$
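The finite-n version of the pseudo-distance $\delta_{ji}$ and properties (i)-(iii) can be illustrated numerically. In this sketch (our illustration; the random design, the coefficients and $\sigma^2 = 1$ are assumptions), $\delta_{ji}$ is computed directly from its definition using the projection onto the column space of $X_i$.

```python
import numpy as np


def projection(X):
    """H = X (X'X)^{-1} X', the orthogonal projection onto the column space of X."""
    return X @ np.linalg.solve(X.T @ X, X.T)


def pseudo_distance(Xj, beta_j, Xi, sigma2=1.0):
    """delta_ji = beta_j' Xj' (I - H_i) Xj beta_j / (n sigma^2)."""
    n = Xj.shape[0]
    mean_j = Xj @ beta_j
    resid = mean_j - projection(Xi) @ mean_j     # (I - H_i) X_j beta_j
    return float(resid @ resid) / (n * sigma2)


rng = np.random.default_rng(2)
n = 200
X = rng.standard_normal((n, 6))
X -= X.mean(axis=0)                              # centred columns, as in Section 1
Xj, beta_j = X[:, :3], np.array([1.0, -1.0, 0.5])
X_small, X_large = X[:, 3:4], X[:, 3:6]          # X_small is nested in X_large

print(pseudo_distance(Xj, beta_j, Xj))           # property (i): zero up to rounding
print(pseudo_distance(Xj, beta_j, X[:, :5]))     # property (ii): M_j is nested in this model
print(pseudo_distance(Xj, beta_j, X_small), pseudo_distance(Xj, beta_j, X_large))  # property (iii)
```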


In what follows, let $\operatorname*{plim}_{n\to\infty}[M]\,Z_n$ denote the limit in probability of the random sequence $\{Z_n : n \ge 1\}$ under the assumption that we are sampling from model M. We present a lemma that is critical for deriving the main theorems in this paper; its proof follows directly from Lemma 1 of [8] and is omitted.


Lemma 1. Suppose that we are interested in comparing two models $M_i$ and $M_p$ with dimensions i and p, respectively, where $M_i$ is nested in $M_p$. As n approaches infinity, both i and p grow with n as $i = O(n^{\alpha_1})$ and $p = O(n^{\alpha_2})$ for $0 < \alpha_1 \le \alpha_2 \le 1$. When sampling from the true model $M_T$, the following hold.

(i) If $0 < \alpha_1 < \alpha_2 < 1$, then
$$\operatorname*{plim}_{n\to\infty}[M_T]\,\frac{1-R_p^2}{1-R_i^2} = \frac{1+\delta^*_{Tp}}{1+\delta^*_{Ti}}.$$

(ii) If $0 < \alpha_1 < \alpha_2 = 1$, then
$$\operatorname*{plim}_{n\to\infty}[M_T]\,\frac{1-R_p^2}{1-R_i^2} = \frac{1+\delta^*_{Tp}-1/r}{1+\delta^*_{Ti}},$$
where $r = \lim_{n\to\infty} n/p > 1$.

(iii) If $\alpha_1 = \alpha_2 = 1$, then
$$\operatorname*{plim}_{n\to\infty}[M_T]\,\frac{1-R_p^2}{1-R_i^2} = \frac{1+\delta^*_{Tp}-1/r}{1+\delta^*_{Ti}-1/s},$$
where $r = \lim_{n\to\infty} n/p > 1$ and $s = \lim_{n\to\infty} n/i > 1$.
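Lemma 1(ii) can be illustrated by a small Monte Carlo sketch. Everything here is an assumed toy setting: i.i.d. standard normal predictors, $p = n/4$ (so $r = 4$), $i = 10$, and a hypothetical true model $M_T$ with five coefficients equal to 0.5 on predictors that belong to $M_p$ but not to $M_i$, so that $\delta^*_{Tp} = 0$ and $\delta^*_{Ti} = 5 \times 0.5^2 = 1.25$. Because both $R^2$ values share the same total sum of squares, the ratio $(1-R_p^2)/(1-R_i^2)$ equals a ratio of residual sums of squares.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of y regressed on X with an intercept."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return float(resid @ resid)


rng = np.random.default_rng(0)
n, p, i = 2000, 500, 10                         # r = n / p = 4; i stays small relative to n
X = rng.standard_normal((n, p))
beta_T = np.zeros(p)
beta_T[i:i + 5] = 0.5                           # hypothetical M_T: inside M_p, outside M_i
y = 1.0 + X @ beta_T + rng.standard_normal(n)   # sigma^2 = 1

ratio = rss(y, X) / rss(y, X[:, :i])            # equals (1 - R_p^2) / (1 - R_i^2)
r = n / p
delta_Tp, delta_Ti = 0.0, 5 * 0.5 ** 2          # limiting pseudo-distances for this design
theory = (1 + delta_Tp - 1 / r) / (1 + delta_Ti)
print(round(ratio, 3), round(theory, 3))
```

With $n = 2000$ the simulated ratio should sit close to the limit, up to Monte Carlo error of a few percent.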

We are now in a position to characterize the consistency of the Bayes factor in (3.2) for comparing nonnested linear models. We begin with Scenario 1, in which the dimensions of models $M_i$ and $M_j$ are $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$, respectively, with $0 < \alpha_1 < \alpha_2 < 1$. The following theorem summarizes Bayes factor consistency when either of the two models is the true model.

Theorem 3. Let $M_0$ be the null model nested in both nonnested models $M_i$ and $M_j$, whose dimensions are i and j, respectively. Suppose that $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 < 1$, and that $\delta^*_{ij} > 0$ and $\delta^*_{ji} > 0$. Then the Bayes factor in (3.2) is consistent whichever the true model is.

Proof. See the Appendix. □

Under the same asymptotic scenario, Giron et al. [8] also conclude that the intrinsic Bayes factor is consistent whichever the true model is when $\delta^*_{ij} > 0$ and $\delta^*_{ji} > 0$. This agreement between the two Bayes factors arises because their asymptotic approximations share exactly the same dominant term under Scenario 1. Note that Theorem 3 is also valid for any other base model nested in both $M_i$ and $M_j$, even though the theorem is derived using the null-based approach. Moreover, Theorem 3 can be directly applied to the case in which the dimensions of the two competing models are fixed, because this can be viewed as a limiting case in which both $\lim_{n\to\infty} n/j$ and $\lim_{n\to\infty} n/i$ are infinite.

Corollary 1. Suppose that we are interested in comparing two models $M_i$ and $M_j$ with dimensions i and j, respectively, and that both dimensions are fixed. Then the Bayes factor in (3.2) is consistent under both models provided that $\delta^*_{ij} > 0$ and $\delta^*_{ji} > 0$.

We now investigate Bayes factor consistency when the dimension of one of the nonnested models is of order $O(n)$. The main results are provided in the following theorem.

Theorem 4. Let $M_0$ be the null model nested in both nonnested models $M_i$ and $M_j$, whose dimensions are i and j, respectively. Suppose that $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 = 1$, and that there exists a positive constant r such that $r = \lim_{n\to\infty} n/j > 1$.

(a) The Bayes factor in (3.2) is consistent under $M_i$, provided that $\delta^*_{ij} > 0$.

(b) The Bayes factor in (3.2) is consistent under $M_j$, provided that
$$\delta^*_{ji} \in \left(K(r,\delta^*_{j0}),\ \delta^*_{j0}\right] \qquad (3.4)$$
and
$$\delta^*_{j0} > \delta(r), \qquad (3.5)$$
where $K(r,s) = [r(1+s)]^{1/r} - 1$ and $\delta(r) = r^{1/(r-1)} - 1$.

Proof. See the Appendix. □
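The bounds in Theorem 4(b) are simple to evaluate. The sketch below (our illustration; the function names are hypothetical) codes $K(r,s)$, $\delta(r)$ and a membership test for (3.4)-(3.5). It also includes a heuristic limit of $(2/n)\log BF[M_j:M_i]$ under $M_j$, namely $\log(1+\delta^*_{ji}) - r^{-1}\log[r(1+\delta^*_{j0})]$, which we obtained from Lemma 1(ii) and Stirling’s formula as an assumption for illustration (it is not quoted from the Appendix); its sign changes exactly at $\delta^*_{ji} = K(r,\delta^*_{j0})$.

```python
import math


def K(r, s):
    """Lower bound K(r, s) = [r(1 + s)]^{1/r} - 1 in (3.4); requires r > 1."""
    return (r * (1 + s)) ** (1 / r) - 1


def delta_r(r):
    """Threshold delta(r) = r^{1/(r-1)} - 1 in (3.5); requires r > 1."""
    return r ** (1 / (r - 1)) - 1


def consistent_under_Mj(r, delta_ji, delta_j0):
    """True if (delta*_ji, delta*_j0) satisfies both (3.4) and (3.5)."""
    return delta_j0 > delta_r(r) and K(r, delta_j0) < delta_ji <= delta_j0


def rate_under_Mj(r, delta_ji, delta_j0):
    """Heuristic limit of (2/n) log BF[M_j : M_i] under M_j (Scenario 2)."""
    return math.log1p(delta_ji) - math.log(r * (1 + delta_j0)) / r


r = 3.0
print(delta_r(r), K(r, 2.0), consistent_under_Mj(r, 1.5, 2.0))
```

At $\delta^*_{j0} = \delta(r)$ the interval in (3.4) collapses to a point, since $K(r,\delta(r)) = \delta(r)$; this is why (3.5) is needed for the consistency region to be nonempty.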


Some interesting findings can be drawn from the theorem. First, the lower bound of $\delta^*_{j0}$, denoted by $\delta(r)$, is exactly the same as the one in Theorem 2 of [22] for comparing nested linear models. Second, Theorem 4 can be extended to nested model comparisons (i.e., $M_i$ nested in $M_j$) by taking $M_0 = M_i$. Third, the Bayes factor depends on the choice of the base model through the value of $\delta^*_{j0}$; therefore, to enlarge the consistency region in (3.4), we need to make $\delta^*_{j0}$ as large as possible. This justifies the null model $M_0$ as the best choice of base model. Fourth, the lower bound of $\delta^*_{ji}$, denoted by $K(r,\delta^*_{j0})$, is a bounded decreasing function in r and satisfies, for any $\delta^*_{j0} > 0$,

$$\lim_{r\to\infty} K(r,\delta^*_{j0}) = 0.$$
Finally, under the same scenario, Giron et al. [8] consider the consistency of the intrinsic Bayes factor and conclude that it is consistent under $M_i$ if $\delta^*_{ij} > 0$, and consistent under $M_j$ provided that $\delta^*_{j0} > f(r)$ with




r — 1 

(r + l)G-b/’'- 1 


- 1 , 


(3.6) 


and

$$\delta^*_{ji} \in \left(\eta(r,\delta^*_{j0}),\ \delta^*_{j0}\right], \qquad (3.7)$$


where r]{r, s) = - 1. 

It is interesting to observe that the asymptotic behaviors of the two Bayes factors depend on the pseudo-distance $\delta^*_{ji}$, which is bounded by $\delta^*_{j0}$. Figure 1 shows that the upper bounds of their inconsistency regions approach each other as r increases. Moreover, Figure 2 shows their lower bounds for different values of $\delta^*_{j0}$. When $\delta^*_{j0}$ is small, the consistency region of the proposed Bayes factor is contained in that of the intrinsic Bayes factor, although the difference between the two regions is small; see Figure 2(a). However, when $\delta^*_{j0}$ gets larger, the consistency region of the proposed Bayes factor contains that of the intrinsic Bayes factor, and the difference between the two regions becomes significant as $\delta^*_{j0}$ increases; see Figure 2(b). Thus, we may conclude that, as $\delta^*_{j0}$ increases, the proposed Bayes factor outperforms the intrinsic Bayes factor from a theoretical viewpoint.
Figure 1. The inconsistency region comparisons (below the curves) for the proposed Bayes factor and the intrinsic Bayes factor under Scenario 2.


It is worth mentioning that the existence of an inconsistency region around the null hypothesis is quite reasonable from a practical point of view: the smaller, non-true model $M_i$ is parsimonious in large-p situations and is generally selected in model selection if the true larger model $M_j$ is not easily distinguishable from $M_i$. From a prediction point of view, Maruyama [15] has demonstrated the reasonableness of the inconsistency region for the one-way fixed-effects ANOVA model, which can be viewed as a special case of the classical linear model in (1.1) after some reparameterization.



Figure 2 (panels: (a) $\delta^*_{j0} = 0.5$; (b) $\delta^*_{j0} = 20$). The lower bounds of the consistency regions in (3.4) and (3.7) with different limiting values of $\delta^*_{j0}$ under Scenario 2.
A theoretical justification of this line of thought for a more general model is still under investigation and will be reported elsewhere.

Theorems 3 and 4 mainly focus on the consistency of the Bayes factor for the case in which at least one model is of order $O(n^{\alpha})$ for $\alpha < 1$. It is worth investigating the consistency issue for the case in which both models are of order $O(n)$, that is, both model dimensions grow as fast as n. As noted by Giron et al. [8], this scenario remains an open problem for the intrinsic Bayes factor. We summarize the consistency of the proposed Bayes factor under this scenario in the following theorem.


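Theorem 5. Let $M_0$ be the null model nested in both nonnested models $M_i$ and $M_j$, with dimensions $i = O(n)$ and $j = O(n)$, respectively. Suppose that there exist positive constants r and s such that $r = \lim_{n\to\infty} n/j > 1$ and $s = \lim_{n\to\infty} n/i > 1$. Without loss of generality, we assume that $r \le s$.

(a) The Bayes factor in (3.2) is consistent under $M_i$ provided that
$$\delta^*_{ij} \in \left(\frac{r-1}{r}\left\{\left[\frac{s^{1/s}}{r^{1/r}}(1+\delta^*_{i0})^{1/s-1/r}\right]^{r/(r-1)} - 1\right\},\ \delta^*_{i0}\right] \qquad (3.8)$$
and that $\delta^*_{i0} > 0$ satisfies
$$\left(1+\frac{\delta^*_{i0}}{1-1/r}\right)^{1-1/r} > \frac{(1/r)^{1/r}}{(1/s)^{1/s}}\,(1+\delta^*_{i0})^{1/s-1/r}. \qquad (3.9)$$

(b) The Bayes factor in (3.2) is consistent under $M_j$ provided that
$$\delta^*_{ji} \in \left(\phi(r,s,\delta^*_{j0}),\ \delta^*_{j0}\right], \qquad (3.10)$$
where
$$\phi(a,b,c) = \frac{b-1}{b}\left\{\left[\frac{a^{1/a}}{b^{1/b}}(1+c)^{1/a-1/b}\right]^{b/(b-1)} - 1\right\},$$
and that $\delta^*_{j0} > 0$ satisfies
$$\left(1+\frac{\delta^*_{j0}}{1-1/s}\right)^{1-1/s} > \frac{(1/s)^{1/s}}{(1/r)^{1/r}}\,(1+\delta^*_{j0})^{1/r-1/s}. \qquad (3.11)$$

Proof. See the Appendix. □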
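The function $\phi$ and the bounds of Theorem 5 can be explored numerically. As in the earlier sketches, the two rate functions below are heuristic limits of $(2/n)\log BF[M_j:M_i]$ under $M_j$ and under $M_i$ that we derived from Lemma 1(iii) and Stirling’s formula; they are assumptions for illustration, not the proofs in the Appendix. Under these assumptions the lower bound in (3.10) is $\phi(r,s,\delta^*_{j0})$, the lower bound in (3.8) equals $\phi(s,r,\delta^*_{i0})$, and each rate vanishes exactly at its bound.

```python
import math


def phi(a, b, c):
    """phi(a, b, c) as defined after (3.10); requires a, b > 1 and c > -1."""
    base = a ** (1 / a) / b ** (1 / b) * (1 + c) ** (1 / a - 1 / b)
    return (b - 1) / b * (base ** (b / (b - 1)) - 1)


def rate_under_Mj(r, s, delta_ji, delta_j0):
    """Heuristic limit of (2/n) log BF[M_j : M_i] when M_j is true (Scenario 3)."""
    return (-math.log(r) / r + math.log(s) / s
            + (1 - 1 / s) * math.log1p(delta_ji / (1 - 1 / s))
            - (1 / r - 1 / s) * math.log1p(delta_j0))


def rate_under_Mi(r, s, delta_ij, delta_i0):
    """Heuristic limit of (2/n) log BF[M_j : M_i] when M_i is true (Scenario 3)."""
    return (-math.log(r) / r + math.log(s) / s
            - (1 - 1 / r) * math.log1p(delta_ij / (1 - 1 / r))
            + (1 / s - 1 / r) * math.log1p(delta_i0))


r, s, d = 2.0, 5.0, 1.0
print(phi(r, s, d))                                   # lower bound in (3.10)
print(phi(s, r, d))                                   # lower bound in (3.8), negative here
print(phi(r, 1e8, d), (r * (1 + d)) ** (1 / r) - 1)   # s -> infinity recovers K(r, d)
```

Consistency under $M_j$ corresponds to a positive rate and consistency under $M_i$ to a negative one; when $r = s$, $\phi$ is identically zero, matching the remark that the inconsistency region then disappears.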


Unlike in the first two asymptotic scenarios, Theorem 5(a) shows that under Scenario 3 there exists an inconsistency region around the alternative hypothesis when $M_i$ is true, and that the consistency under $M_i$ depends on the chosen base model $M_0$ only through the distance $\delta^*_{i0}$. The existence of this inconsistency region is quite reasonable, because there are many candidates for the base model, which could have a dimension of order $O(n^{\alpha_1})$ with $\alpha_1 < 1$. In particular, the inconsistency region disappears when $r = s$. This is also understandable, because with the same growth rates, the more parsimonious model is typically preferred in model selection. Furthermore, it can easily be shown that the inequality in (3.9) and the lower bound of the consistency region in (3.8) are both valid for any $\delta^*_{i0} > 0$ if $s^{1/s} \le r^{1/r}$, indicating that for any $\delta^*_{i0} > 0$, the inconsistency region disappears whenever $s > r > e \approx 2.718$. To enlarge the consistency region in (3.8), we need to choose a base model that maximizes the distance $\delta^*_{i0}$. Finally, when s tends to infinity, the inconsistency region disappears for any $\delta^*_{i0} > 0$ and $r > 1$, which shows that Theorem 5(a) reduces to Theorem 4(a).

Theorem 5(b) shows that the consistency region under $M_j$ depends on the chosen base model only through $\delta^*_{j0}$. Thus, the base model should be chosen as small as possible to maximize the value of $\delta^*_{j0}$. Note that when $r = s$, the inconsistency region under $M_j$ disappears. Also, if the rate of growth of $M_i$ is smaller than that of $M_j$ (i.e., s tends to infinity), then since $\lim_{s\to\infty} s^{1/s} = 1$, the inequality in (3.11) becomes

$$\delta^*_{j0} > r^{1/(r-1)} - 1 = \delta(r), \qquad (3.12)$$

which is inequality (3.5) in Theorem 4, and the lower bound in (3.10) becomes


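$$\lim_{s\to\infty}\phi(r,s,\delta^*_{j0}) = \lim_{s\to\infty}\frac{s-1}{s}\left\{\left[\frac{r^{1/r}}{s^{1/s}}(1+\delta^*_{j0})^{1/r-1/s}\right]^{s/(s-1)} - 1\right\} = [r(1+\delta^*_{j0})]^{1/r} - 1 = K(r,\delta^*_{j0}).$$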




This shows that Theorem 4(b) is a special case of Theorem 5(b) as s approaches infinity. We may thus conclude that, as s tends to infinity, Theorem 5 reduces to Theorem 4.

We have compared the consistency of the proposed Bayes factor with that of the intrinsic Bayes factor due to [8] under the first two asymptotic scenarios above. A brief summary of the comparisons between the two Bayes factors is presented in Table 1. We observe that the consistency results presented here are similar to those for the intrinsic Bayes factor studied by Giron et al. [8]. The similarity arises mainly because the asymptotic behaviors of both Bayes factors depend on the limiting value of $(1-R_p^2)/(1-R_i^2)$ summarized in Lemma 1. The consistency of the intrinsic Bayes factor is still an open problem under Scenario 3. We conjecture that, under Scenario 3, the consistency of the intrinsic Bayes factor also behaves similarly to that of the proposed Bayes factor, but this conjecture requires further investigation.


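Table 1. The consistency regions of the Bayes factor in (3.2) and the intrinsic Bayes factor due to [8] for different choices of $\alpha_1$ and $\alpha_2$

Rate of divergence | The proposed Bayes factor | The intrinsic Bayes factor
$\alpha_1 = \alpha_2 = 1$ | $M_j$: $\delta^*_{j0}$ satisfies (3.11) and $\delta^*_{ji} \in (\phi(r,s,\delta^*_{j0}),\ \delta^*_{j0}]$ | $M_j$: unknown
$0 < \alpha_1 < \alpha_2 = 1$ | $M_j$: $\delta^*_{j0} > \delta(r)$ and $\delta^*_{ji} \in (K(r,\delta^*_{j0}),\ \delta^*_{j0}]$ | $M_j$: $\delta^*_{j0} > f(r)$ and $\delta^*_{ji} \in (\eta(r,\delta^*_{j0}),\ \delta^*_{j0}]$
$0 < \alpha_1 < \alpha_2 < 1$ | $M_i$, $M_j$: $\delta^*_{ij} > 0$ and $\delta^*_{ji} > 0$ | $M_i$, $M_j$: $\delta^*_{ij} > 0$ and $\delta^*_{ji} > 0$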
4. Application

It is well known that ANalysis Of VAriance (ANOVA) models are extremely important in exploratory and confirmatory data analysis in various fields, including agriculture, biology, ecology, and psychology. One major difference between ANOVA models and the classical linear model is that the matrix $[\mathbf{1}_n, X_p]$ does not necessarily have full column rank in the ANOVA setting. Some constraints are thus required to make the model identifiable. Here, under the sum-to-zero constraint [6], the ANOVA model with constraints for uniqueness can be reparameterized into the classical linear model without constraints; see [26].

As an illustration, Maruyama [15] and Wang and Sun [21] reparameterize ANOVA models with the sum-to-zero constraint into the classical linear model in (1.1). Based on Zellner’s g-prior with the beta-prime prior for g, they then obtain an explicit closed-form Bayes factor, which can be treated as a special case of the Bayes factor in (2.7). Consequently, the asymptotic results for the proposed Bayes factor can easily be applied to various ANOVA models. The application to the one-way ANOVA model is straightforward and is omitted here. In this section, we mainly consider the results for the two-way balanced ANOVA model with the same number of observations per cell. The results can also be generalized to cover the unbalanced case.

Consider a factorial design with two treatment factors A and B having p and q levels, respectively, with a total of pq factorial cells. Suppose that $y_{ijl}$ is the lth observation in the (i, j)th cell defined by the ith level of A and the jth level of B, satisfying the model

$$y_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijl}, \qquad \varepsilon_{ijl} \sim N(0,\sigma^2), \qquad (4.1)$$

for $i = 1,\ldots,p$, $j = 1,\ldots,q$, and $l = 1,\ldots,r$. The total number of observations is $n = pqr$. We shall be interested in the following five submodels:

$M_0$: No effect of A and no effect of B, that is, $\alpha_i = 0$, $\beta_j = 0$, and $\gamma_{ij} = 0$ for all i and j.

$M_1$: Only the effect of A, that is, $\beta_j = 0$ and $\gamma_{ij} = 0$ for all i and j.

$M_2$: Only the effect of B, that is, $\alpha_i = 0$ and $\gamma_{ij} = 0$ for all i and j.

$M_3$: The additive model (without interaction), that is, $\gamma_{ij} = 0$ for all i and j.

$M_4$: The full model (with interaction).
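One way to carry out the reparameterization is sum-to-zero (effect) coding, sketched below with NumPy. This coding is our assumption and need not match the contrasts used in [15]; since (2.7) depends on the data only through $R^2$ and the model dimension, any coding spanning the same column space (together with the intercept) yields the same Bayes factor. The sketch also shows that $M_4$ has $pq-1$ predictors, so $n/(pq-1) \to r$ as p and q grow, which is how the replication number r plays the role of $\lim n/j$ in Theorem 4.

```python
import numpy as np


def effect_coding(levels, k):
    """Sum-to-zero coding of a factor with k levels: k - 1 columns, last level coded -1."""
    Z = np.zeros((len(levels), k - 1))
    for row, lev in enumerate(levels):
        if lev < k - 1:
            Z[row, lev] = 1.0
        else:
            Z[row, :] = -1.0
    return Z


def two_way_designs(p, q, r):
    """Predictor matrices (intercept excluded) for M_1, ..., M_4 in a balanced p x q x r layout."""
    A = np.repeat(np.arange(p), q * r)               # level of factor A for each observation
    B = np.tile(np.repeat(np.arange(q), r), p)       # level of factor B for each observation
    ZA, ZB = effect_coding(A, p), effect_coding(B, q)
    ZAB = np.einsum("ni,nj->nij", ZA, ZB).reshape(len(A), -1)   # interaction columns
    return {"M1": ZA, "M2": ZB, "M3": np.hstack([ZA, ZB]), "M4": np.hstack([ZA, ZB, ZAB])}


designs = two_way_designs(p=4, q=3, r=2)
for name, Z in designs.items():
    print(name, Z.shape[1])                          # model dimension (number of predictors)
```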

By using the sum-to-zero constraint, Maruyama derives an explicit closed-form Bayes factor associated with Zellner’s g-prior for the regression coefficients of the reparameterized model (i.e., equation (4.7) of [15]) and the beta-prime distribution for the scaling factor g. Moreover, Maruyama studies the consistency of the Bayes factor under different asymptotic scenarios. When both p and q approach infinity and r is fixed, Maruyama concludes that the Bayes factor is consistent except under the full model $M_4$, and that when sampling from $M_4$, the Bayes factor is consistent only if



$$\delta^*_{43} > H(r,\ \delta^*_{10} + \delta^*_{20}), \qquad (4.2)$$
where $\delta^*_{ij}$ is the limit, as n tends to infinity, of the sum of squares of the differences between the coefficients of model $M_i$ and those of model $M_j$, and, for positive c, $H(r,c)$ is the (unique) positive solution of

$$\frac{(x+1)^r}{r} - (x+1) - c = 0. \qquad (4.3)$$

Such an inconsistency region arises from the comparison between $M_4$ and $M_3$. Of particular note is that when comparing $M_4$ and $M_3$, we are in the case of Theorem 4 with $\alpha_2 = 1$, and that any null hypothesis results in a model $M_i$ with a reduced set of parameters satisfying $\alpha_1 < \alpha_2$ in Theorem 4. Consequently, when sampling from the full model $M_4$, the Bayes factor in (3.2) is consistent only if $\delta(r) < \delta^*_{40}$ and

$$\delta^*_{4i} > [r(1+\delta^*_{40})]^{1/r} - 1. \qquad (4.4)$$

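When comparing models $M_4$ and $M_3$, and using $\delta^*_{40} = \delta^*_{10} + \delta^*_{20} + \delta^*_{43}$ for this balanced design, the consistency region in (4.4) becomes

$$\delta^*_{43} > [r(1 + \delta^*_{10} + \delta^*_{20} + \delta^*_{43})]^{1/r} - 1,$$

which is equivalent to requiring $\delta^*_{43}$ to exceed the unique positive root, in $\delta^*_{43}$, of

$$\frac{(\delta^*_{43} + 1)^r}{r} - (\delta^*_{43} + 1) - (\delta^*_{10} + \delta^*_{20}) = 0. \tag{4.5}$$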

This coincides exactly with equation (4.3), provided by Maruyama [15], with $c = \delta^*_{10} + \delta^*_{20}$. It is worth mentioning that the results of the preceding section extend straightforwardly to higher-order designs.
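For readers who want to verify the equivalence between the consistency region for $M_4$ versus $M_3$ and the root condition (4.3), the following short derivation may help. It is our own sketch, writing $x = \delta^*_{43}$ and $c = \delta^*_{10} + \delta^*_{20} > 0$, and it assumes $r > 1$.

$$
\begin{aligned}
x > [r(1+c+x)]^{1/r} - 1
&\iff (1+x)^r > r(1+x) + rc \\
&\iff f(x) := \frac{(1+x)^r}{r} - (1+x) - c > 0 .
\end{aligned}
$$

Since $f(0) = 1/r - 1 - c < 0$, $f'(x) = (1+x)^{r-1} - 1 > 0$ for $x > 0$, and $f(x) \to \infty$ as $x \to \infty$, the function $f$ has exactly one positive root, namely $H(r, c)$ in (4.3), and for $x \ge 0$ the inequality $f(x) > 0$ holds precisely when $x > H(r, c)$.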


5. Concluding remarks 

In this paper, we have investigated the consistency of the Bayes factor for nonnested linear models when the model dimension grows with the sample size. We have shown that in some cases the proposed Bayes factor is consistent whichever model is true, while in others the consistency depends on the pseudo-distance between the larger model and the base model. Specifically, the pseudo-distance can be used to characterize the inconsistency region of the Bayes factor. Comparing the consistency properties of the proposed Bayes factor with those of the intrinsic Bayes factor, we observe that the asymptotic results presented here are similar to those for the intrinsic Bayes factor. It would be interesting to examine the finite-sample performance of the two Bayes factors; this is currently under investigation and will be reported elsewhere.

The consistency of the Bayes factor further indicates that, besides the three commonly used families of hyper-g priors in [14], the beta-prime prior is also a good candidate for the scaling factor g in Zellner's g-prior. A similar observation was made by Wang and Sun [22] when studying Bayes factor consistency for nested linear models with a growing number of parameters. From a theoretical point of view, we may conclude that, like the intrinsic Bayes factor, the proposed Bayes factor should serve as a powerful tool for model selection in the class of normal regression models, owing to its comparable asymptotic performance.

It is worth investigating the consistency of Bayes factors based on the mixtures of g-priors of [14] under the three asymptotic scenarios. However, for most mixtures of g-priors, the Bayes factor may not have an analytically tractable form, and efficient approximations are required. Standard approximation techniques, such as the Laplace approximation, become quite challenging when the number of parameters grows with the sample size, because the approximation error needs to be uniformly small over the class of all possible models. A similar situation was encountered by Berger, Ghosh and Mukhopadhyay [2] in their study of ANOVA models. We plan to address these issues in future work.

Finally, it is worth mentioning that we mainly address Bayes factor consistency for a special choice of the hyperparameter b in the beta-prime prior, which yields an explicit closed-form expression of the Bayes factor. In an ongoing project, we investigate the effect of b on the consistency of the Bayes factor, especially when b does not depend on n.


Appendix 


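By Stirling's formula, the gamma function admits the well-known asymptotic approximation

$$\Gamma(\gamma_1 x + \gamma_2) \approx \sqrt{2\pi}\, e^{-\gamma_1 x} (\gamma_1 x)^{\gamma_1 x + \gamma_2 - 1/2} \tag{A.1}$$

for fixed $\gamma_1 > 0$ and $\gamma_2$, when $x$ is sufficiently large. Here, "$f \approx g$" indicates that the ratio of the two sides approaches one as $x$ tends to infinity, that is,

$$\lim_{x \to \infty} \frac{\Gamma(\gamma_1 x + \gamma_2)}{\sqrt{2\pi}\, e^{-\gamma_1 x} (\gamma_1 x)^{\gamma_1 x + \gamma_2 - 1/2}} = 1.$$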
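The limit above can be checked numerically on the log scale with Python's `math.lgamma`. This is a minimal sketch of our own; the helper names are hypothetical, and the choice $\gamma_1 = 1/2$, $\gamma_2 = -3$ simply mimics $\Gamma((n-j-1)/2)$ with $j = 5$ held fixed.

```python
import math


def log_stirling_a1(x: float, g1: float, g2: float) -> float:
    """Log of the right-hand side of (A.1): sqrt(2*pi) * exp(-g1*x) * (g1*x)**(g1*x + g2 - 1/2)."""
    z = g1 * x
    return 0.5 * math.log(2 * math.pi) - z + (z + g2 - 0.5) * math.log(z)


def log_ratio_a1(x: float, g1: float, g2: float) -> float:
    """Log of Gamma(g1*x + g2) divided by the approximation (A.1); should tend to 0."""
    return math.lgamma(g1 * x + g2) - log_stirling_a1(x, g1, g2)


for x in (1e2, 1e4, 1e6):
    print(f"x = {x:.0e}: log ratio = {log_ratio_a1(x, 0.5, -3.0):.2e}")
```

Working with logarithms avoids overflow, since $\Gamma(\cdot)$ itself is far beyond floating-point range for these arguments.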


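Proof of Theorem 2. When the model dimension $j$ is bounded and the sample size $n$ is large, it follows directly from Stirling's formula (A.1) that

$$\Gamma\Big(\frac{n-j-1}{2}\Big) \approx \sqrt{2\pi}\, e^{-n/2}\Big(\frac{n}{2}\Big)^{(n-j)/2-1} \quad\text{and}\quad \Gamma\Big(\frac{n-1}{2}\Big) \approx \sqrt{2\pi}\, e^{-n/2}\Big(\frac{n}{2}\Big)^{n/2-1}.$$

The Bayes factor in (2.7) is therefore asymptotically equivalent to

$$\frac{\Gamma(j/2+a+1)}{\Gamma(a+1)}\cdot\frac{\sqrt{2\pi}\, e^{-n/2}(n/2)^{(n-j)/2-1}}{\sqrt{2\pi}\, e^{-n/2}(n/2)^{n/2-1}}\,(1-R_j^2)^{-(n-j-1)/2+a+1} \propto n^{-j/2}(1-R_j^2)^{-n/2} = \exp\Big\{-\frac{j}{2}\log n - \frac{n}{2}\log(1-R_j^2)\Big\},$$

where $\propto$ absorbs factors that remain bounded as $n \to \infty$ because $j$ is bounded. This completes the proof.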


□ 
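A quick numerical check of the last display is sketched below. It rests on our own assumption that the Bayes factor (2.7) has the form obtained by setting $i = 0$ in (3.2), namely $\Gamma(j/2+a+1)\Gamma((n-j-1)/2)/[\Gamma(a+1)\Gamma((n-1)/2)]\,(1-R_j^2)^{-(n-j-1)/2+a+1}$. Holding $j$, $a$ and $R_j^2$ fixed, the gap between the log Bayes factor and the exponent $-(j/2)\log n - (n/2)\log(1-R_j^2)$ should settle to a constant, which is what the proportionality sign asserts. All function names are hypothetical.

```python
import math


def log_bf_vs_base(n: int, j: int, r2: float, a: float) -> float:
    """Assumed form of log BF in (2.7): (3.2) with i = 0 (a dimension-zero base model)."""
    return (math.lgamma(j / 2 + a + 1) - math.lgamma(a + 1)
            + math.lgamma((n - j - 1) / 2) - math.lgamma((n - 1) / 2)
            + (-(n - j - 1) / 2 + a + 1) * math.log(1 - r2))


def bic_type_exponent(n: int, j: int, r2: float) -> float:
    """The exponent -(j/2) log n - (n/2) log(1 - R_j^2) from the proof of Theorem 2."""
    return -0.5 * j * math.log(n) - 0.5 * n * math.log(1 - r2)


def limiting_gap(j: int, r2: float, a: float) -> float:
    """Limit of log_bf_vs_base - bic_type_exponent as n grows (derived via Stirling)."""
    return (math.lgamma(j / 2 + a + 1) - math.lgamma(a + 1)
            + 0.5 * j * math.log(2)
            + ((j + 1) / 2 + a + 1) * math.log(1 - r2))


j, r2, a = 3, 0.4, 0.5
for n in (10**3, 10**5, 10**7):
    gap = log_bf_vs_base(n, j, r2, a) - bic_type_exponent(n, j, r2)
    print(n, round(gap, 6), round(limiting_gap(j, r2, a), 6))
```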


We now investigate the model selection consistency of the Bayes factor in (3.2) under the three asymptotic scenarios mentioned above. For simplicity of notation, let $C_1, \ldots, C_5$ denote finite positive constants throughout the following proofs. When $j/2 + a + 1$ and $(n - j - 1)/2$ are sufficiently large, it follows directly from Stirling's formula that

$$\Gamma\Big(\frac{j}{2} + a + 1\Big) \approx \sqrt{2\pi}\, e^{-j/2}\Big(\frac{j}{2}\Big)^{j/2+a+1/2}$$

and

$$\Gamma\Big(\frac{n-j-1}{2}\Big) \approx \sqrt{2\pi}\, e^{-(n-j)/2}\Big(\frac{n-j}{2}\Big)^{(n-j)/2-1}.$$


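Proof of Theorem 3. Under Scenario 1, $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 < 1$. By using the two approximations above, it follows that

$$\begin{aligned}
\mathrm{BF}[M_j : M_i] &= \frac{\Gamma(j/2+a+1)\,\Gamma((n-j-1)/2)}{\Gamma(i/2+a+1)\,\Gamma((n-i-1)/2)}\cdot\frac{(1-R_j^2)^{-(n-j-1)/2+a+1}}{(1-R_i^2)^{-(n-i-1)/2+a+1}} \\
&\approx C_1\,\frac{j^{j/2+a+1/2}\,(n-j)^{(n-j)/2-1}\,(1-R_j^2)^{-(n-j)/2}}{i^{i/2+a+1/2}\,(n-i)^{(n-i)/2-1}\,(1-R_i^2)^{-(n-i)/2}} \\
&= C_1\Big(\frac{j}{i}\Big)^{a+1/2}\frac{1-i/n}{1-j/n}\left[\frac{(j/n)^{j/n}(1-j/n)^{1-j/n}(1-R_j^2)^{-(1-j/n)}}{(i/n)^{i/n}(1-i/n)^{1-i/n}(1-R_i^2)^{-(1-i/n)}}\right]^{n/2}.
\end{aligned}\tag{A.2}$$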
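The algebra behind (A.2) can be checked numerically. In the sketch below (function names are our own, hypothetical) the exact log Bayes factor from the first line of (A.2) is compared with the logarithm of its last line, where, in our derivation, $C_1 = [(1-R_j^2)/(1-R_i^2)]^{a+3/2}$ is the factor left over when $(1-R_k^2)^{-(n-k-1)/2+a+1}$ is split as $(1-R_k^2)^{-(n-k)/2}(1-R_k^2)^{a+3/2}$. The remaining difference is pure Stirling error and should shrink as $i$, $j$ and $n$ grow.

```python
import math


def log_bf_exact(n, i, j, r2_i, r2_j, a):
    """log BF[M_j : M_i] from the first line of (A.2)."""
    return (math.lgamma(j / 2 + a + 1) + math.lgamma((n - j - 1) / 2)
            - math.lgamma(i / 2 + a + 1) - math.lgamma((n - i - 1) / 2)
            + (-(n - j - 1) / 2 + a + 1) * math.log(1 - r2_j)
            - (-(n - i - 1) / 2 + a + 1) * math.log(1 - r2_i))


def log_bf_a2(n, i, j, r2_i, r2_j, a):
    """log of the last line of (A.2), with C_1 = ((1 - R_j^2) / (1 - R_i^2))**(a + 3/2)."""
    def log_piece(k, r2):
        # log of (k/n)^(k/n) * (1 - k/n)^(1 - k/n) * (1 - R_k^2)^(-(1 - k/n))
        t = k / n
        return t * math.log(t) + (1 - t) * math.log(1 - t) - (1 - t) * math.log(1 - r2)

    log_c1 = (a + 1.5) * (math.log(1 - r2_j) - math.log(1 - r2_i))
    return (log_c1
            + (a + 0.5) * math.log(j / i)
            + math.log((1 - i / n) / (1 - j / n))
            + (n / 2) * (log_piece(j, r2_j) - log_piece(i, r2_i)))


small = log_bf_exact(10_000, 50, 200, 0.3, 0.35, 0.5) - log_bf_a2(10_000, 50, 200, 0.3, 0.35, 0.5)
large = log_bf_exact(100_000, 500, 2000, 0.3, 0.35, 0.5) - log_bf_a2(100_000, 500, 2000, 0.3, 0.35, 0.5)
print(small, large)  # both close to zero; the second is closer
```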


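(a) We first show the Bayes factor consistency when the true model is $M_i$. As $n$ tends to infinity, both $i/n$ and $j/n$ tend to zero, so that $(1-j/n)^{1-j/n}$ and $(1-i/n)^{1-i/n}$ tend to one and the dominant part of the term in brackets of (A.2) can be approximated by

$$\frac{(j/n)^{j/n}\,(1-R_j^2)^{-1}}{(i/n)^{i/n}\,(1-R_i^2)^{-1}}.$$

From Lemma 1(a) and the fact that $\delta_{ii} = 0$, it follows that under $M_i$

$$\mathrm{BF}[M_j : M_i] \approx C_2\Big(\frac{j}{i}\Big)^{a+1/2}\frac{(j/n)^{j/2}}{(i/n)^{i/2}}\Big(\frac{1+\delta_{ij}}{1+\delta_{ii}}\Big)^{-n/2} = C_2\Big(\frac{j}{i}\Big)^{a+1/2}\frac{(j/n)^{j/2}}{(i/n)^{i/2}}\,(1+\delta_{ij})^{-n/2}.$$

Because $i, j = O(n^{\alpha_2})$ with $\alpha_2 < 1$, the factors preceding $(1+\delta_{ij})^{-n/2}$ are of order $\exp\{O(n^{\alpha_2}\log n)\} = \exp\{o(n)\}$, whereas $(1+\delta_{ij})^{-n/2}$ decays exponentially in $n$. The Bayes factor therefore approaches zero whenever $\delta_{ij} > 0$, indicating that the Bayes factor in (3.2) is consistent when $M_i$ is true.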
(b) Consistency under $M_j$ is shown as follows. By Lemma 1(a) and the fact that $\delta_{jj} = 0$, under model $M_j$ the Bayes factor in (3.2) can be further approximated by

$$\mathrm{BF}[M_j : M_i] \approx C_3\Big(\frac{j}{i}\Big)^{a+1/2}\frac{(j/n)^{j/2}}{(i/n)^{i/2}}\Big(\frac{1+\delta_{jj}}{1+\delta_{ji}}\Big)^{-n/2} = C_3\Big(\frac{j}{i}\Big)^{a+1/2}\frac{(j/n)^{j/2}}{(i/n)^{i/2}}\,(1+\delta_{ji})^{n/2}.$$

As $n$ tends to infinity, the last factor $(1+\delta_{ji})^{n/2}$ grows exponentially in $n$ if $\delta_{ji} > 0$, whereas the remaining factors are of order $\exp\{o(n)\}$. Therefore, the Bayes factor approaches infinity when $\delta_{ji} > 0$, proving the consistency under $M_j$. This completes the proof of the theorem. □
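To see the two regimes of Theorem 3 numerically, the sketch below evaluates the exact log Bayes factor of (3.2) after plugging in limiting values of $1 - R^2$ that are consistent with the approximations above: under the true model $M_t$, $1 - R_t^2 = 1/(1+\delta_{t0})$, and for the other model $M_k$, $1 - R_k^2 = (1+\delta_{tk})/(1+\delta_{t0})$. These limits are our assumption about the content of Lemma 1(a); the exponents $\alpha_1 = 0.3$, $\alpha_2 = 0.6$, the value $a = 0.5$, the distances, and the helper names are all illustrative and hypothetical.

```python
import math


def log_bf(n, i, j, omr_i, omr_j, a):
    """Exact log BF[M_j : M_i] from (3.2); omr_k stands for 1 - R_k^2."""
    return (math.lgamma(j / 2 + a + 1) + math.lgamma((n - j - 1) / 2)
            - math.lgamma(i / 2 + a + 1) - math.lgamma((n - i - 1) / 2)
            + (-(n - j - 1) / 2 + a + 1) * math.log(omr_j)
            - (-(n - i - 1) / 2 + a + 1) * math.log(omr_i))


def scenario1_log_bf(n, true_model, delta=0.2, delta0=0.5, a=0.5, alpha1=0.3, alpha2=0.6):
    """log BF[M_j : M_i] with i ~ n^alpha1, j ~ n^alpha2 and assumed limiting 1 - R^2 values.

    delta: pseudo-distance from the true model to the other model
    (delta_ij under M_i, delta_ji under M_j); delta0: its distance to the base model.
    """
    i, j = round(n ** alpha1), round(n ** alpha2)
    omr_true, omr_other = 1 / (1 + delta0), (1 + delta) / (1 + delta0)
    if true_model == "Mi":
        return log_bf(n, i, j, omr_true, omr_other, a)
    return log_bf(n, i, j, omr_other, omr_true, a)


for n in (10**4, 10**5, 10**6):
    print(n, round(scenario1_log_bf(n, "Mi")), round(scenario1_log_bf(n, "Mj")))
```

Under $M_i$ the log Bayes factor drifts to $-\infty$ and under $M_j$ to $+\infty$, roughly linearly in $n$, matching the exponential factors $(1+\delta_{ij})^{-n/2}$ and $(1+\delta_{ji})^{n/2}$ in the proof.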


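Proof of Theorem 4. Under Scenario 2, $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $0 < \alpha_1 < \alpha_2 = 1$, where $j/n \to 1/r$ for some $r > 1$. By using the two approximations above, it follows that

$$\begin{aligned}
\mathrm{BF}[M_j : M_i] &= \frac{\Gamma(j/2+a+1)\,\Gamma((n-j-1)/2)}{\Gamma(i/2+a+1)\,\Gamma((n-i-1)/2)}\cdot\frac{(1-R_j^2)^{-(n-j-1)/2+a+1}}{(1-R_i^2)^{-(n-i-1)/2+a+1}} \\
&\approx C_1\Big(\frac{j}{i}\Big)^{a+1/2}\frac{1-i/n}{1-j/n}\left[\frac{(j/n)^{j/n}(1-j/n)^{1-j/n}(1-R_j^2)^{-(1-j/n)}}{(i/n)^{i/n}(1-i/n)^{1-i/n}(1-R_i^2)^{-(1-i/n)}}\right]^{n/2}.
\end{aligned}\tag{A.3}$$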


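(a) If the true model is $M_i$, then from Lemma 1(b) and the fact that $\delta_{ii} = 0$, the dominant term in brackets of (A.3) can be approximated by

$$\Big(\frac{1}{r}\Big)^{1/r}\Big(1-\frac{1}{r}\Big)^{1-1/r}\Big(\frac{1-1/r+\delta_{ij}}{1+\delta_{i0}}\Big)^{-(1-1/r)}\frac{1+\delta_{ii}}{1+\delta_{i0}} = \Big(\frac{1}{r}\Big)^{1/r}\Big(\frac{1-1/r}{1-1/r+\delta_{ij}}\Big)^{1-1/r}\Big(\frac{1}{1+\delta_{i0}}\Big)^{1/r}.$$

Accordingly, the Bayes factor in (3.2) can be approximated by

$$\mathrm{BF}[M_j : M_i] \approx C_4\Big(\frac{j}{i}\Big)^{a+1/2}\left[\Big(\frac{1}{r}\Big)^{1/r}\Big(\frac{1-1/r}{1-1/r+\delta_{ij}}\Big)^{1-1/r}\Big(\frac{1}{1+\delta_{i0}}\Big)^{1/r}\right]^{n/2}.$$

Since $r > 1$, we have $(1/r)^{1/r} < 1$, while the other two factors in brackets are at most one. The bracketed term is therefore strictly less than one, so its $(n/2)$th power decays exponentially, whereas $(j/i)^{a+1/2}$ grows only polynomially in $n$. Hence the Bayes factor approaches zero as $n$ tends to infinity, and the consistency under $M_i$ is proved.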
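The key step in part (a), that the bracketed term is strictly below one whenever $r > 1$, can be spot-checked on a grid. The function below is a hypothetical helper that simply transcribes that bracket; the grid values are arbitrary.

```python
def bracket_a(r: float, d_ij: float, d_i0: float) -> float:
    """Bracketed term of the approximation under M_i in part (a) of the proof of Theorem 4."""
    return ((1 / r) ** (1 / r)
            * ((1 - 1 / r) / (1 - 1 / r + d_ij)) ** (1 - 1 / r)
            * (1 / (1 + d_i0)) ** (1 / r))


# Grid check: the bracket stays strictly below one for r > 1 and nonnegative distances.
grid = [(r, d1, d0) for r in (1.5, 2, 3, 10) for d1 in (0, 0.5, 2) for d0 in (0, 1, 5)]
print(max(bracket_a(*p) for p in grid))
```

The largest value on this grid comes from $\delta_{ij} = \delta_{i0} = 0$, where the bracket reduces to $(1/r)^{1/r}$.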
(b) If the true model is $M_j$, then from Lemma 1(b) and the fact that $\delta_{jj} = 0$, the dominant term in brackets of (A.3) can be approximated by

$$\Big(\frac{1}{r}\Big)^{1/r}\Big(1-\frac{1}{r}\Big)^{1-1/r}\Big(\frac{1-1/r+\delta_{jj}}{1+\delta_{j0}}\Big)^{-(1-1/r)}\frac{1+\delta_{ji}}{1+\delta_{j0}} = \Big(\frac{1}{r}\Big)^{1/r}(1+\delta_{ji})\Big(\frac{1}{1+\delta_{j0}}\Big)^{1/r}.$$

Therefore, under $M_j$ the Bayes factor in (3.2) turns out to be

$$\mathrm{BF}[M_j : M_i] \approx C_5\Big(\frac{j}{i}\Big)^{a+1/2}\left[\Big(\frac{1}{r}\Big)^{1/r}(1+\delta_{ji})\Big(\frac{1}{1+\delta_{j0}}\Big)^{1/r}\right]^{n/2}. \tag{A.4}$$

To show the consistency under $M_j$, it is sufficient to show that the term in brackets of (A.4) is strictly larger than one as $n$ tends to infinity. This is equivalent to

$$\Big(\frac{1}{r}\Big)^{1/r}(1+\delta_{ji})\Big(\frac{1}{1+\delta_{j0}}\Big)^{1/r} > 1,$$

which gives

$$\delta_{ji} > [r(1+\delta_{j0})]^{1/r} - 1.$$

On the other hand, we have $\delta_{ji} \le \delta_{j0}$, which yields

$$\delta_{j0} \ge \delta_{ji} > [r(1+\delta_{j0})]^{1/r} - 1,$$

so that $(1+\delta_{j0})^{r} > r(1+\delta_{j0})$, that is,

$$\delta_{j0} > r^{1/(r-1)} - 1 = \delta(r).$$

In order for the interval in which the distance $\delta_{ji}$ should lie,

$$\delta_{ji} \in \Big([r(1+\delta_{j0})]^{1/r} - 1,\ \delta_{j0}\Big],$$

to be nonempty, a necessary and sufficient condition is that $\delta_{j0} > \delta(r)$. This completes the proof.


□ 
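The threshold $\delta(r)$ and the admissible interval for $\delta_{ji}$ from part (b) are easy to tabulate. The two helpers below are hypothetical names for a direct transcription of those formulas; the loop in the test confirms, on a few grid points away from the boundary, that the interval is nonempty exactly when $\delta_{j0} > \delta(r)$.

```python
def delta_r(r: float) -> float:
    """delta(r) = r^(1/(r-1)) - 1, the threshold in part (b) of Theorem 4; requires r > 1."""
    return r ** (1 / (r - 1)) - 1


def consistency_interval(r: float, d_j0: float):
    """Interval (lower, d_j0] for delta_ji from part (b), or None if it is empty."""
    lower = (r * (1 + d_j0)) ** (1 / r) - 1
    return (lower, d_j0) if lower < d_j0 else None


print(delta_r(2))                    # 1.0
print(consistency_interval(2, 3.0))  # lower end is 2*sqrt(2) - 1
print(consistency_interval(2, 0.5))  # None, because 0.5 < delta(2) = 1
```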
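Proof of Theorem 5. Under Scenario 3, $i = O(n^{\alpha_1})$ and $j = O(n^{\alpha_2})$ with $\alpha_1 = \alpha_2 = 1$, where $i/n \to 1/s$ and $j/n \to 1/r$ for some $r, s > 1$. By using the two approximations above, it follows that

$$\begin{aligned}
\mathrm{BF}[M_j : M_i] &= \frac{\Gamma(j/2+a+1)\,\Gamma((n-j-1)/2)}{\Gamma(i/2+a+1)\,\Gamma((n-i-1)/2)}\cdot\frac{(1-R_j^2)^{-(n-j-1)/2+a+1}}{(1-R_i^2)^{-(n-i-1)/2+a+1}} \\
&\approx C_1\Big(\frac{j}{i}\Big)^{a+1/2}\frac{1-i/n}{1-j/n}\left[\frac{(j/n)^{j/n}(1-j/n)^{1-j/n}(1-R_j^2)^{-(1-j/n)}}{(i/n)^{i/n}(1-i/n)^{1-i/n}(1-R_i^2)^{-(1-i/n)}}\right]^{n/2}.
\end{aligned}\tag{A.5}$$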


(a) If the true model is $M_i$, then from Lemma 1(c) and the fact that $\delta_{ii} = 0$, the dominant term in brackets of (A.5) can be approximated by

$$\frac{(1/r)^{1/r}(1-1/r)^{1-1/r}\big[(1-1/r+\delta_{ij})/(1+\delta_{i0})\big]^{-(1-1/r)}}{(1/s)^{1/s}(1-1/s)^{1-1/s}\big[(1-1/s+\delta_{ii})/(1+\delta_{i0})\big]^{-(1-1/s)}} = \frac{(1/r)^{1/r}\,\big[1+\delta_{ij}/(1-1/r)\big]^{-(1-1/r)}}{(1/s)^{1/s}\,(1+\delta_{i0})^{1/r-1/s}}. \tag{A.6}$$


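For the Bayes factor to be consistent, it is sufficient to show that the term in (A.6) is strictly less than one as $n$ approaches infinity. This is equivalent to

$$\Big(1+\frac{\delta_{ij}}{1-1/r}\Big)^{1-1/r} > \frac{(1/r)^{1/r}}{(1/s)^{1/s}(1+\delta_{i0})^{1/r-1/s}},$$

which implies that

$$\delta_{ij} > \frac{r-1}{r}\left\{\left[\frac{r^{1/r}(1+\delta_{i0})^{1/r-1/s}}{s^{1/s}}\right]^{-r/(r-1)} - 1\right\}.$$

In addition, from the property of the pseudo-distance, we have $\delta_{i0} \ge \delta_{ij}$. Therefore, it follows that

$$\delta_{i0} \ge \delta_{ij} > \frac{r-1}{r}\left\{\left[\frac{r^{1/r}(1+\delta_{i0})^{1/r-1/s}}{s^{1/s}}\right]^{-r/(r-1)} - 1\right\},$$

indicating that the value of $\delta_{ij}$ must satisfy

$$\delta_{ij} \in \left(\frac{r-1}{r}\left\{\left[\frac{r^{1/r}(1+\delta_{i0})^{1/r-1/s}}{s^{1/s}}\right]^{-r/(r-1)} - 1\right\},\ \delta_{i0}\right].$$

Under the conditions stated in the theorem, we take limits and obtain that the Bayes factor tends to zero, and thus the Bayes factor is consistent under $M_i$.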
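The passage from "(A.6) is less than one" to the explicit lower bound on $\delta_{ij}$ is pure algebra, and it can be spot-checked numerically. The sketch below transcribes (A.6) and the resulting threshold; both function names are hypothetical, and the grid deliberately ignores the constraint $\delta_{ij} \le \delta_{i0}$ because only the algebraic equivalence is being tested.

```python
def a6_term(r, s, d_ij, d_i0):
    """Right-hand side of (A.6): the bracketed term under M_i in Scenario 3."""
    return ((1 / r) ** (1 / r) * (1 + d_ij / (1 - 1 / r)) ** (-(1 - 1 / r))
            / ((1 / s) ** (1 / s) * (1 + d_i0) ** (1 / r - 1 / s)))


def a6_lower_bound(r, s, d_i0):
    """Threshold for delta_ij implied by (A.6) < 1 in part (a) of the proof of Theorem 5."""
    b = r ** (1 / r) * (1 + d_i0) ** (1 / r - 1 / s) / s ** (1 / s)
    return (r - 1) / r * (b ** (-r / (r - 1)) - 1)


# With r = 1.2, s = 10 and delta_i0 = 0 the threshold is positive (about 0.1).
print(a6_lower_bound(1.2, 10, 0.0))
```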

(b) If the true model is $M_j$, then from Lemma 1(c) and the fact that $\delta_{jj} = 0$, the dominant term in brackets of (A.5) can be approximated by

$$\frac{(1/r)^{1/r}(1-1/r)^{1-1/r}\big[(1-1/r+\delta_{jj})/(1+\delta_{j0})\big]^{-(1-1/r)}}{(1/s)^{1/s}(1-1/s)^{1-1/s}\big[(1-1/s+\delta_{ji})/(1+\delta_{j0})\big]^{-(1-1/s)}} = \frac{(1/r)^{1/r}\,\big[1+\delta_{ji}/(1-1/s)\big]^{1-1/s}}{(1/s)^{1/s}\,(1+\delta_{j0})^{1/r-1/s}}. \tag{A.7}$$

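For the Bayes factor to be consistent, it is sufficient to show that the term in (A.7) is strictly larger than one as $n$ approaches infinity. This is equivalent to

$$\frac{(1/r)^{1/r}\,\big[1+\delta_{ji}/(1-1/s)\big]^{1-1/s}}{(1/s)^{1/s}\,(1+\delta_{j0})^{1/r-1/s}} > 1.$$

Simple algebra shows that

$$\delta_{ji} > \frac{s-1}{s}\left\{\left[\frac{r^{1/r}(1+\delta_{j0})^{1/r-1/s}}{s^{1/s}}\right]^{s/(s-1)} - 1\right\}.$$

On the other hand, we also have $\delta_{j0} \ge \delta_{ji}$, which, combined with the previous inequality, gives

$$\delta_{j0} \ge \delta_{ji} > \frac{s-1}{s}\left\{\left[\frac{r^{1/r}(1+\delta_{j0})^{1/r-1/s}}{s^{1/s}}\right]^{s/(s-1)} - 1\right\},$$

indicating that

$$\delta_{j0} > \frac{s-1}{s}\left\{\left[\frac{r^{1/r}(1+\delta_{j0})^{1/r-1/s}}{s^{1/s}}\right]^{s/(s-1)} - 1\right\}. \tag{A.8}$$

In order for the interval in which the distance $\delta_{ji}$ should lie,

$$\delta_{ji} \in \left(\frac{s-1}{s}\left\{\left[\frac{r^{1/r}(1+\delta_{j0})^{1/r-1/s}}{s^{1/s}}\right]^{s/(s-1)} - 1\right\},\ \delta_{j0}\right],$$

to be nonempty, a necessary and sufficient condition is that $\delta_{j0}$ satisfies inequality (A.8). This completes the proof. □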
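As for part (a), the algebra in part (b) can be spot-checked. The sketch transcribes (A.7), the lower bound on $\delta_{ji}$, and inequality (A.8); all three function names are hypothetical, and the grid again only tests the algebraic equivalence, not the constraint $\delta_{ji} \le \delta_{j0}$.

```python
def a7_term(r, s, d_ji, d_j0):
    """Right-hand side of (A.7): the bracketed term under M_j in Scenario 3."""
    return ((1 / r) ** (1 / r) * (1 + d_ji / (1 - 1 / s)) ** (1 - 1 / s)
            / ((1 / s) ** (1 / s) * (1 + d_j0) ** (1 / r - 1 / s)))


def a7_lower_bound(r, s, d_j0):
    """Threshold for delta_ji implied by (A.7) > 1 in part (b) of the proof of Theorem 5."""
    b = r ** (1 / r) * (1 + d_j0) ** (1 / r - 1 / s) / s ** (1 / s)
    return (s - 1) / s * (b ** (s / (s - 1)) - 1)


def satisfies_a8(r, s, d_j0):
    """Inequality (A.8): the interval (a7_lower_bound, d_j0] is nonempty."""
    return d_j0 > a7_lower_bound(r, s, d_j0)


print(satisfies_a8(2, 4, 1.0), satisfies_a8(3, 10, 0.0))  # True False
```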